Abstract


Several extended caution indices (ECIs) have been introduced
earlier as a link between two distinctly different approaches: one
based on standard statistics and the other a model-based approach
utilizing item response theory (IRT). Expected values and variances of
some ECIs are derived, and their statistical properties are compared and
discussed. Then, standardized ECIs are introduced and their
distributions are investigated. It turns out that the standardized ECIs
fit normal distributions well. A comparison of detection rates among
appropriateness measures based on IRT is carried out with the
signed-number dataset. There is no noticeable difference in their
detection rates using the 80% intervals.
Introduction


An increasing number of researchers have begun to show interest in
using response patterns of n items for analyzing performance on test
scores. By so doing, more information is obtainable than by using only
traditional total scores. Tatsuoka and her colleagues (Birenbaum &
Tatsuoka, 1982a, b; Tatsuoka & Tatsuoka, 1982a) have demonstrated that
some wrong rules of arithmetic computations (fractions and signed
numbers) can produce the right score of 1 on as much as 60% of the test
items. If many students apply a variety of wrong rules consistently
throughout the test, then these faulty rules cause a serious problem by
violating the unidimensionality assumption of a dataset. After
rescoring the correct responses obtained by faulty rules, the dataset
became nearly unidimensional. They have developed several indices to
detect aberrant response patterns resulting from consistent application
of wrong rules (Tatsuoka & Tatsuoka, 1982b) and have shown one of them,
the individual consistency index (ICI), to spot more than 90% of such
aberrant response patterns (Tatsuoka & Tatsuoka, 1981).

Rudner (1982) investigated the detection rates of various personal
indices (norm conformity index, caution index, personal biserial and
appropriateness measures based on item response theory) and found that
the indices based on IRT are more efficient for detecting anomalous
response patterns than those based on observed item responses and summary
statistics. However, estimating parameters of IRT models requires a
substantial number of subjects, and it is often impossible to have such
a large sample size in many classroom settings.

Sato (1975) developed the caution index in conjunction with S-P
curve theory and successfully used it for diagnosing students'
performance and evaluating instructional materials in Japan. Harnisch
and Linn (1981) demonstrated its usefulness by applying it to a NAEP
dataset (National Assessment of Educational Progress). Although their
analysis is based on a large dataset, their results show clearly that
analysis of response patterns as a whole provides very useful information
associated with individual differences, curriculum differences and
school differences.

The concepts of S-P curve theory and the caution index have been
extended by Tatsuoka and Linn (1982) from the approach based on
discrete summary statistics to the continuous domain of IRT models. They
have developed five alternative indices and named them extended
caution indices 1, 2, 3, 4 and 5. In this paper, further statistical
properties of ECI1, 2, and 4 will be discussed and their detection rates
will be compared.

Statistical  Properties  of  Extended  Caution  Indices 
Definition  of  the  Extended  Caution  Indices 

A group of extended caution indices (ECIs) has been introduced as a
link between two distinct approaches to detecting aberrant response
patterns (Tatsuoka & Linn, 1981). One is based on the use of binary
response patterns and their standard summary statistics (Sato, 1975;
van der Flier, 1977; Tatsuoka & Tatsuoka, 1980, 1982a), while the other is
a model-based approach. In the latter, the patterns of probabilities
that are derived from item response theory are utilized in calculating
appropriateness measures together with observed binary response patterns
(Wright, 1977; Drasgow, 1978; Levine & Rubin, 1979). ECIs are an
extension of Sato's caution index to the approach using IRT. In this
section, three of the five ECIs will be investigated in terms of their
expected values, variances, and advantages and disadvantages.

Let $y_{ij}$ ($i = 1, \ldots, N$; $j = 1, \ldots, n$) be the binary score
of subject i on item j, $y_{i\cdot}$ be the ith row sum, and $y_{\cdot j}$
the jth column sum of the data matrix $(y_{ij})$. Let $P_{ij}$ be the
probability of subject i answering item j correctly, which may be based on
the one-, two- or three-parameter logistic model. That is,

\[
P_{ij} = c_j + \frac{1 - c_j}{1 + \exp[-D a_j (\theta_i - b_j)]}
\]

where $c_j = 0$ and $a_j = 1$ for the one-parameter logistic model, and
$c_j = 0$ for the two-parameter logistic model. Thus, two data matrices
may be introduced: one comprising observed binary scores of n items for
N subjects, $(y_{ij})$, and the other consisting of $(P_{ij})$. We refer to
$(y_{ij})$ as the observed binary matrix and $(P_{ij})$ as the probability
matrix.
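As a minimal sketch of how the probability matrix could be computed from the logistic model above, the following Python fragment evaluates $P_{ij}$ for a few subjects and items. The function names, the item parameters and the scaling constant D = 1.7 are assumptions made only for illustration; the text does not state the value of D or give parameter values.

```python
import math

def logistic_probability(theta, a=1.0, b=0.0, c=0.0, D=1.7):
    """P_ij = c_j + (1 - c_j) / (1 + exp(-D * a_j * (theta_i - b_j))).

    D = 1.7 is an assumed scaling constant; the text leaves D unspecified.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def probability_matrix(thetas, items, D=1.7):
    """Rows are subjects, columns are items; each item is a tuple (a, b, c)."""
    return [[logistic_probability(t, a, b, c, D) for (a, b, c) in items]
            for t in thetas]

# Hypothetical item parameters (a_j, b_j, c_j), chosen only for illustration.
items = [(1.0, -1.0, 0.0), (1.2, 0.0, 0.0), (0.8, 1.0, 0.2)]
P = probability_matrix([-1.0, 0.0, 1.0], items)
```

Setting c = 0 and a = 1 for every item gives the one-parameter case, and c = 0 alone gives the two-parameter case, as described above.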

Let $G_j$ be the jth element of a vector approximating the group
response curve (GRC) for item j, and $T_i$ be that of the vector for the
test response curve (TRC) for subject i. Then

\[
G_j = \frac{1}{N}\sum_{i=1}^{N} P_{ij}, \qquad
T_i = \frac{1}{n}\sum_{j=1}^{n} P_{ij} .
\]

In other words, $G_j$ for item j and $T_i$ for subject i are the jth column
sum and the ith row sum of the probability matrix $(P_{ij})$, divided by
N and n, respectively.
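The GRC and TRC values can be sketched in a few lines of Python. This fragment assumes that $G_j$ and $T_i$ are the column and row sums of the probability matrix divided by N and n, which is what the identity $T = \sum_i T_i / N = \sum_j G_j / n$ used later in the derivations requires. The function name `grc_trc` and the small matrix are hypothetical.

```python
def grc_trc(P):
    """Return (G, T_rows, T) for a probability matrix P given as a list of rows.

    Assumption: G_j and T_i are means (column sum / N and row sum / n).
    """
    N, n = len(P), len(P[0])
    G = [sum(P[i][j] for i in range(N)) / N for j in range(n)]  # one value per item
    T_rows = [sum(row) / n for row in P]                        # one value per subject
    T = sum(T_rows) / N                                         # grand mean
    return G, T_rows, T

# Hypothetical 2 x 3 probability matrix, for illustration only.
P = [[0.9, 0.6, 0.3],
     [0.7, 0.4, 0.1]]
G, T_rows, T = grc_trc(P)
```

With this normalization the mean of the $G_j$ over items and the mean of the $T_i$ over subjects coincide, so either can serve as the overall mean T.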

Three of the five ECIs are defined as complements of the ratio of
two covariances between various pairs of row vectors taken from
the two matrices.

\[
\mathrm{ECI1}_i = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{y}_{\cdot})}
                           {\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}
\tag{1}
\]

\[
\mathrm{ECI2}_i = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{G})}
                           {\mathrm{cov}(\mathbf{P}_i, \mathbf{G})}
\tag{2}
\]

\[
\mathrm{ECI4}_i = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{P}_i)}
                           {\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)}
\tag{3}
\]

where

$\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{in})$, the vector of binary scores
for subject i, is the ith row vector,

$\mathbf{y}_{\cdot} = (y_{\cdot 1}, y_{\cdot 2}, \ldots, y_{\cdot n})$ is the
column-sum vector in the observed binary matrix,

$\mathbf{P}_i = (P_{i1}, P_{i2}, \ldots, P_{in})$ is the probability vector
from the ith row of the probability matrix, and

$\mathbf{G} = (G_1, G_2, \ldots, G_n)$ is the GRC vector, which is the
column-sum vector of $(P_{ij})$.

Expression (1) is defined by forming the ratio of the following
covariances: the numerator is the covariance of subject i's response
pattern and the column-sum vector over n items in $(y_{ij})$, and the
denominator is the covariance of the ith row probability vector derived
from a logistic model and the column-sum vector in $(y_{ij})$. Expressions
(2) and (3) have the same denominator, the covariance of the GRC vector
and the ith probability vector, and the numerators are covariances of
the response pattern vector with the GRC vector and the probability
vector, respectively.

When $\mathbf{y}_i$ consists of all 1s or 0s, the second terms of the ECIs
become undetermined.
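The three definitions can be sketched directly from Expressions (1) to (3). The Python fragment below is only an illustration under stated assumptions: the covariance is taken over the n items with divisor n, and the vectors `P_i`, `G` and `y_col` are hypothetical numbers rather than values from the dataset.

```python
def cov(u, v):
    """Covariance of two equal-length vectors over the n items (divisor n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n

def eci1(y_i, y_col, P_i):
    """Expression (1): response pattern against the observed column sums."""
    return 1.0 - cov(y_i, y_col) / cov(P_i, y_col)

def eci2(y_i, G, P_i):
    """Expression (2): response pattern against the GRC vector."""
    return 1.0 - cov(y_i, G) / cov(P_i, G)

def eci4(y_i, G, P_i):
    """Expression (3): response pattern against the subject's own probabilities."""
    return 1.0 - cov(y_i, P_i) / cov(G, P_i)

# Hypothetical four-item example (items ordered from easy to hard).
P_i = [0.9, 0.7, 0.4, 0.2]      # probability vector for subject i
G = [0.8, 0.6, 0.5, 0.3]        # GRC vector
y_col = [40, 30, 25, 15]        # observed column sums
typical = [1, 1, 0, 0]          # easy items right, hard items wrong
unusual = [0, 0, 1, 1]          # easy items wrong, hard items right

print(eci2(typical, G, P_i), eci2(unusual, G, P_i))
print(eci4(typical, G, P_i), eci4(unusual, G, P_i))
```

In this sketch the pattern that misses easy items and passes hard ones receives the larger index value on each of the three indices. The functions divide by a covariance, so they raise `ZeroDivisionError` whenever the denominator covariance is zero.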
The expectations of ECI1, ECI2 and ECI4

In this section, the expectations and variances of the three ECIs
given by Equations (1), (2) and (3) will be derived. The actual
values of the ECIs for subject i can be calculated by replacing the item
and person parameters with their estimated values $\hat a_j$, $\hat b_j$ and
$\hat\theta_i$ based on the maximum likelihood method. It is known that the
maximum likelihood estimates of item and person parameters satisfy the
likelihood conditions (Lord and Novick, 1968) given in Equations (4).

\[
\sum_{i=1}^{N} \hat P_{ij} = \sum_{i=1}^{N} y_{ij}, \qquad
\sum_{j=1}^{n} \hat P_{ij} = \sum_{j=1}^{n} y_{ij}, \qquad
\sum_{i=1}^{N}\sum_{j=1}^{n} \hat P_{ij} = \sum_{i=1}^{N}\sum_{j=1}^{n} y_{ij} .
\tag{4}
\]

Since the ECIs are functions of the person parameter $\theta_i$, the
conditional expected values and variances of the ECIs for a fixed ability
level will be introduced. Hereafter, the circumflex on $\hat P_{ij}$ (and
its ith-row vector $\hat{\mathbf{P}}_i$) will be omitted to simplify the
notation.

ECI1

The conditional expectation of the first ECI defined in Equation
(1) is given by the following:

\[
E(\mathrm{ECI1} \mid \theta_i)
  = 1 - E\!\left[\frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot})}
                      {\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}
          \,\Big|\, \theta_i\right]
  = 1 - \frac{E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]}
             {\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})} .
\tag{5}
\]
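The observed vector $\mathbf{y}_k$ is a random vector at the level
$\theta_i$ and the expectation is obtained over it. Now, we have to find the
expectation in the numerator of the second fraction,
$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]$. First,
the covariance of $\mathbf{y}_k$ and $\mathbf{y}_{\cdot}$ is rewritten as the
summation of the product of the deviations:

\[
E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]
  = \frac{1}{n}\, E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})
      \,\Big|\, \theta_i\Big]
\]

where $p_{i\cdot}$ is the ith row mean of $(y_{ij})$ and $p_{\cdot\cdot}$ is
the mean of the row means or column means as follows,

\[
p_{\cdot\cdot} = \frac{1}{n}\sum_{j=1}^{n} p_{\cdot j}
               = \frac{1}{N}\sum_{i=1}^{N} p_{i\cdot} .
\]

By using the second members of Equations (4), this expectation
reduces to the covariance of $\mathbf{P}_i$ and $\mathbf{y}_{\cdot}$. Thus,
the conditional expectation of ECI1 at the fixed level $\theta_i$ becomes
zero, as summarized in Equation (6).

\[
E(\mathrm{ECI1} \mid \theta_i)
  = 1 - \frac{\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}
             {\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}
  = 0
\tag{6}
\]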


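The conditional variance of ECI1 at the fixed level $\theta_i$ is

\[
\mathrm{Var}(\mathrm{ECI1} \mid \theta_i)
  = E\{[\mathrm{ECI1} - E(\mathrm{ECI1} \mid \theta_i)]^2 \mid \theta_i\} .
\tag{7}
\]

By substituting the result from (6), the conditional variance
(7) becomes $E(\mathrm{ECI1}^2 \mid \theta_i)$. That is:

\[
E(\mathrm{ECI1}^2 \mid \theta_i)
  = E\!\left(\left[1 - \frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot})}
                            {\mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})}\right]^2
      \Big|\, \theta_i\right)
  = -1 + \frac{E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]}
              {\mathrm{cov}^2(\mathbf{P}_i, \mathbf{y}_{\cdot})}
\tag{8}
\]

where we have again used the fact that
$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{y}_{\cdot}) \mid \theta_i]
 = \mathrm{cov}(\mathbf{P}_i, \mathbf{y}_{\cdot})$.
The numerator of the last term of Equation (8), however, can be expanded
to the sum of the diagonal and off-diagonal terms, and then by applying
the conditions given in Equations (4), we obtain Equation (9).

\[
\begin{aligned}
\frac{1}{n^2}\, E\Big\{\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})\Big]^2
   \Big|\, \theta_i\Big\}
 = \frac{1}{n^2}\Big\{ & E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (y_{\cdot j} - p_{\cdot\cdot})^2
     \,\Big|\, \theta_i\Big] \\
 & + E\Big[\sum_{j \neq h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})
     (y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot}) \,\Big|\, \theta_i\Big]\Big\}
\end{aligned}
\tag{9}
\]

The first term, the diagonal part inside the braces of the above
equation, is:

\[
\begin{aligned}
E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (y_{\cdot j} - p_{\cdot\cdot})^2 \,\Big|\, \theta_i\Big]
 &= \sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})^2\, E[(y_{kj} - p_{i\cdot})^2 \mid \theta_i] \\
 &= \sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})^2\, [P_{ij}(1 - P_{ij}) + (P_{ij} - T_i)^2] .
\end{aligned}
\]

The second term inside the braces is:

\[
\begin{aligned}
E\Big[\sum_{j \neq h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})
      (y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot}) \,\Big|\, \theta_i\Big]
 &= \sum_{j \neq h} (y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot})\,
    E[(y_{kj} - p_{i\cdot}) \mid \theta_i]\, E[(y_{kh} - p_{i\cdot}) \mid \theta_i] \\
 &= \sum_{j \neq h} (y_{\cdot j} - p_{\cdot\cdot})(y_{\cdot h} - p_{\cdot\cdot})
    (P_{ij} - T_i)(P_{ih} - T_i) .
\end{aligned}
\]

Adding the results of the two expectations gives Equation (10).

\[
\begin{aligned}
\frac{1}{n^2}\, E\Big\{\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(y_{\cdot j} - p_{\cdot\cdot})\Big]^2
   \Big|\, \theta_i\Big\}
 &= \frac{1}{n^2}\Big[\sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})(P_{ij} - T_i)\Big]^2
  + \frac{1}{n^2}\sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})^2 P_{ij}(1 - P_{ij}) \\
 &= \mathrm{cov}^2(\mathbf{y}_{\cdot}, \mathbf{P}_i)
  + \frac{1}{n^2}\sum_{j=1}^{n} (y_{\cdot j} - p_{\cdot\cdot})^2 \sigma_{ij}^2
\end{aligned}
\tag{10}
\]

where $\sigma_{ij}^2 = P_{ij}(1 - P_{ij})$.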
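Substituting (10) in Equation (8), the variance of ECI1 becomes:

\[
\mathrm{Var}(\mathrm{ECI1} \mid \theta_i)
 = -1 + \frac{\mathrm{cov}^2(\mathbf{y}_{\cdot}, \mathbf{P}_i)
        + \sum_{j=1}^{n} \sigma_{ij}^2 (y_{\cdot j} - p_{\cdot\cdot})^2 / n^2}
       {\mathrm{cov}^2(\mathbf{P}_i, \mathbf{y}_{\cdot})}
 = \frac{\sum_{j=1}^{n} \sigma_{ij}^2 (y_{\cdot j} - p_{\cdot\cdot})^2}
        {n^2\, \mathrm{cov}^2(\mathbf{P}_i, \mathbf{y}_{\cdot})}
\tag{11}
\]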
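The zero expectation and the variance formula for ECI1 can be checked numerically on a small example. The sketch below is not part of the original derivation: it enumerates all $2^n$ binary patterns for a short hypothetical test, weights each by its probability under local independence, and compares the resulting mean and variance with zero and with Equation (11). It assumes that the centering constant $p_{\cdot\cdot}$ is the mean of the entries of the column-sum vector and that the column sums are held fixed.

```python
from itertools import product

def cov(u, v):
    """Covariance over the n items (divisor n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n

def pattern_prob(y, P_i):
    """Probability of the binary pattern y under local independence."""
    prob = 1.0
    for y_j, p_j in zip(y, P_i):
        prob *= p_j if y_j else 1.0 - p_j
    return prob

def eci1_moments_by_enumeration(P_i, y_col):
    """Exact conditional mean and variance of ECI1 over all 2^n patterns."""
    denom = cov(P_i, y_col)
    mean = second = 0.0
    for y in product((0, 1), repeat=len(P_i)):
        value = 1.0 - cov(y, y_col) / denom
        w = pattern_prob(y, P_i)
        mean += w * value
        second += w * value * value
    return mean, second - mean * mean

def eci1_variance_formula(P_i, y_col):
    """Equation (11), with p.. taken as the mean of y_col (an assumption)."""
    n = len(P_i)
    centre = sum(y_col) / n
    num = sum(p * (1.0 - p) * (s - centre) ** 2 for p, s in zip(P_i, y_col))
    return num / (n * n * cov(P_i, y_col) ** 2)

# Hypothetical values, for illustration only.
P_i = [0.9, 0.7, 0.4, 0.2]
y_col = [40, 30, 25, 15]
mean, variance = eci1_moments_by_enumeration(P_i, y_col)
```

Enumeration is feasible only for very short tests, since the number of patterns doubles with each item; it is used here purely as a check on the algebra.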


ECI2 


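The conditional expectation of the second ECI is given by

\[
E(\mathrm{ECI2} \mid \theta_i)
 = 1 - \frac{E[\mathrm{cov}(\mathbf{y}_k, \mathbf{G}) \mid \theta_i]}
            {\mathrm{cov}(\mathbf{P}_i, \mathbf{G})} .
\tag{12}
\]

But

\[
\begin{aligned}
E[\mathrm{cov}(\mathbf{y}_k, \mathbf{G}) \mid \theta_i]
 &= \frac{1}{n}\, E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(G_j - T) \,\Big|\, \theta_i\Big] \\
 &= \frac{1}{n} \sum_{j=1}^{n} E[(y_{kj} - p_{i\cdot}) \mid \theta_i]\,(G_j - T) \\
 &= \frac{1}{n} \sum_{j=1}^{n} (P_{ij} - T_i)(G_j - T)
  = \mathrm{cov}(\mathbf{P}_i, \mathbf{G}) ,
\end{aligned}
\]

where

\[
T = \sum_{i=1}^{N} T_i / N = \sum_{j=1}^{n} G_j / n .
\]

By substituting this result in Equation (12), we get (13).

\[
E(\mathrm{ECI2} \mid \theta_i)
 = 1 - \frac{\mathrm{cov}(\mathbf{P}_i, \mathbf{G})}{\mathrm{cov}(\mathbf{P}_i, \mathbf{G})}
 = 0
\tag{13}
\]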
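The conditional variance of ECI2 is given by Equation (14),

\[
\begin{aligned}
\mathrm{Var}(\mathrm{ECI2} \mid \theta_i)
 &= E\{[\mathrm{ECI2} - E(\mathrm{ECI2} \mid \theta_i)]^2 \mid \theta_i\} \\
 &= E(\mathrm{ECI2}^2 \mid \theta_i) \\
 &= -1 + \frac{E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{G}) \mid \theta_i]}
              {\mathrm{cov}^2(\mathbf{P}_i, \mathbf{G})} .
\end{aligned}
\tag{14}
\]

The expectation of the squared covariance of $\mathbf{y}_k$ and $\mathbf{G}$
can be simplified and is given by Equation (15).

\[
E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{G}) \mid \theta_i]
 = \mathrm{cov}^2(\mathbf{P}_i, \mathbf{G})
 + \frac{1}{n^2}\sum_{j=1}^{n} \sigma_{ij}^2 (G_j - T)^2
\tag{15}
\]

By substituting (15) in (14), we get (16).

\[
\mathrm{Var}(\mathrm{ECI2} \mid \theta_i)
 = \frac{\sum_{j=1}^{n} (G_j - T)^2 \sigma_{ij}^2}
        {n^2\, \mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)}
\tag{16}
\]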
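To see how Equation (16) behaves across ability levels, the following sketch evaluates the standard error of ECI2 at a low, a middle and a high value of $\theta$. Everything numeric here is hypothetical: a three-item one-parameter logistic test with D = 1 and an arbitrary GRC vector `G`. The helper names `rasch_row` and `se_eci2` are illustrative and are not taken from the text.

```python
import math

def rasch_row(theta, difficulties):
    """One-parameter logistic probabilities (c_j = 0, a_j = 1; D = 1 assumed)."""
    return [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]

def cov(u, v):
    """Covariance over the n items (divisor n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n

def se_eci2(P_i, G):
    """Square root of Equation (16) for one probability vector P_i."""
    n = len(P_i)
    T = sum(G) / n
    num = sum(p * (1.0 - p) * (g - T) ** 2 for p, g in zip(P_i, G))
    return math.sqrt(num) / (n * abs(cov(G, P_i)))

difficulties = [-1.0, 0.0, 1.0]   # hypothetical item difficulties
G = [0.7, 0.5, 0.3]               # hypothetical GRC vector

for theta in (-3.0, 0.0, 3.0):
    print(theta, round(se_eci2(rasch_row(theta, difficulties), G), 3))
```

In this toy setting the standard error is smallest near the middle of the ability range and larger at both extremes, because the denominator covariance shrinks faster than the numerator as all $P_{ij}$ approach 0 or 1.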


ECI4 

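The conditional expectation of ECI4 is

\[
E(\mathrm{ECI4} \mid \theta_i)
 = 1 - E\!\left[\frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i)}
                     {\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} \,\Big|\, \theta_i\right]
\tag{17}
\]

where $\mathbf{y}_k$ is a random variable from the distribution of binary
responses to n items at the fixed ability level $\theta_i$. Since the
denominator of the expected value, $\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)$,
is fixed at level $\theta_i$, the second term will be simply the expectation
of the numerator divided by the covariance of $\mathbf{G}$ and
$\mathbf{P}_i$, that is,
$E[\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i] /
 \mathrm{cov}(\mathbf{G}, \mathbf{P}_i)$.

\[
\begin{aligned}
E[\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]
 &= \frac{1}{n}\, E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(P_{ij} - T_i) \,\Big|\, \theta_i\Big] \\
 &= \frac{1}{n} \sum_{j=1}^{n} (P_{ij} - T_i)\, E(y_{kj} - p_{i\cdot} \mid \theta_i)
\end{aligned}
\]

But $E(y_{kj} - p_{i\cdot} \mid \theta_i) = P_{ij} - T_i$ because of
Equations (4). Therefore,

\[
E(\mathrm{ECI4} \mid \theta_i)
 = 1 - \frac{\mathrm{cov}(\mathbf{P}_i, \mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)}
 = 1 - \frac{\mathrm{Var}(\mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)} .
\tag{18}
\]

The conditional variance of ECI4 is given by Equation (19).

\[
\mathrm{Var}(\mathrm{ECI4} \mid \theta_i)
 = E\{[\mathrm{ECI4} - E(\mathrm{ECI4} \mid \theta_i)]^2 \mid \theta_i\}
\tag{19}
\]

Substituting the expectation of ECI4 from Equation (18), (19) becomes

\[
\mathrm{Var}(\mathrm{ECI4} \mid \theta_i)
 = E\!\left\{\left[\frac{\mathrm{cov}(\mathbf{y}_k, \mathbf{P}_i)}
                        {\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)}
                 - \frac{\mathrm{cov}(\mathbf{P}_i, \mathbf{P}_i)}
                        {\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)}\right]^2
     \Big|\, \theta_i\right\} .
\]

A straightforward expansion of the inside of the brackets leads to
Equation (20).

\[
\mathrm{Var}(\mathrm{ECI4} \mid \theta_i)
 = \frac{E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]}
        {\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)}
 - \frac{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{P}_i)}
        {\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)}
\tag{20}
\]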

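The numerator of the first term,
$E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]$, can be
simplified in the same manner as in the case of ECI1.

\[
\begin{aligned}
E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]
 &= \frac{1}{n^2}\, E\Big\{\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})(P_{ij} - T_i)\Big]^2
      \Big|\, \theta_i\Big\} \\
 &= \frac{1}{n^2}\, E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (P_{ij} - T_i)^2
      \,\Big|\, \theta_i\Big] \\
 &\quad + \frac{1}{n^2}\, E\Big[\sum_{j \neq h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})
      (P_{ij} - T_i)(P_{ih} - T_i) \,\Big|\, \theta_i\Big]
\end{aligned}
\]

Because of local independence and Equations (4), we obtain the following
two relations:

\[
E\Big[\sum_{j=1}^{n} (y_{kj} - p_{i\cdot})^2 (P_{ij} - T_i)^2 \,\Big|\, \theta_i\Big]
 = \sum_{j=1}^{n} [\sigma_{ij}^2 + (P_{ij} - T_i)^2]\,(P_{ij} - T_i)^2
\]

and

\[
E\Big[\sum_{j \neq h} (y_{kj} - p_{i\cdot})(y_{kh} - p_{i\cdot})
      (P_{ij} - T_i)(P_{ih} - T_i) \,\Big|\, \theta_i\Big]
 = \sum_{j \neq h} (P_{ij} - T_i)^2 (P_{ih} - T_i)^2 .
\]

By adding the results, we obtain

\[
\begin{aligned}
E[\mathrm{cov}^2(\mathbf{y}_k, \mathbf{P}_i) \mid \theta_i]
 &= \frac{1}{n^2}\Big[\sum_{j=1}^{n} (P_{ij} - T_i)^2\Big]^2
  + \frac{1}{n^2}\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2 \\
 &= \mathrm{Var}^2(\mathbf{P}_i)
  + \frac{1}{n^2}\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2 .
\end{aligned}
\tag{21}
\]

By substituting (21) in (20), we get Equation (22), the variance of ECI4.

\[
\begin{aligned}
\mathrm{Var}(\mathrm{ECI4} \mid \theta_i)
 &= \frac{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{P}_i)
        + \sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2 / n^2}
         {\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)}
  - \frac{\mathrm{cov}^2(\mathbf{P}_i, \mathbf{P}_i)}
         {\mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)} \\
 &= \frac{\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2}
         {n^2\, \mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)}
\end{aligned}
\tag{22}
\]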
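Equations (18) and (22) translate into two short functions. The sketch below is an illustration only: the function names and the probability vectors are hypothetical, and $\sigma_{ij}^2$ is taken to be $P_{ij}(1 - P_{ij})$ as in the derivation above.

```python
def cov(u, v):
    """Covariance over the n items (divisor n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n

def expected_eci4(P_i, G):
    """Equation (18): 1 - Var(P_i) / cov(G, P_i)."""
    return 1.0 - cov(P_i, P_i) / cov(G, P_i)

def variance_eci4(P_i, G):
    """Equation (22): sum_j sigma_ij^2 (P_ij - T_i)^2 / (n^2 cov^2(G, P_i))."""
    n = len(P_i)
    T_i = sum(P_i) / n
    num = sum(p * (1.0 - p) * (p - T_i) ** 2 for p in P_i)
    return num / (n * n * cov(G, P_i) ** 2)

# Hypothetical GRC vector and two probability rows at different ability levels.
G = [0.8, 0.6, 0.5, 0.3]
lower_row = [0.6, 0.4, 0.3, 0.1]
higher_row = [0.95, 0.9, 0.8, 0.6]

for row in (lower_row, higher_row):
    print(expected_eci4(row, G), variance_eci4(row, G))
```

Unlike ECI1 and ECI2, the expected value here is zero only when the variance of the subject's probability vector equals its covariance with the GRC vector, so it generally changes from one ability level to another.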
Comparison of Some Statistical Properties of the Three Indices

ECI1, ECI2, and ECI4
Comparison of the Standard Errors

The conditional expectations of the three indices are different in
a manner that suggests that ECI1 and ECI2 are similar to each other,
while ECI4 stands alone. ECI1 and ECI2 have the constant expectation
zero, regardless of the level of the person parameter $\theta_i$. On the
other hand, the expectation of ECI4 is a function of $\theta_i$, as shown
in Figure 1 for the dataset obtained from a 32-item signed-number
subtraction test. The x-axis represents true scores and the y-axis the
127 students' expected ECI4 values. The curve in Figure 1 decreases
monotonically as the true score decreases.

Insert Figure 1 about here

The standard error of ECI4 is the square root of expression (22) and is
also a function of $\theta$. Figure 2 shows the relationship between the
standard error and the true scores. (The estimated true score of IRT was
used instead of $\theta_i$ so as to have a value between 0 and 1, which
facilitates comparison across different tests.) For students whose true
scores are extremely high or low, the standard-error curve rises sharply,
while for average scores, it becomes rather flat.

Insert Figure 2 about here

Figures 3 and 4 are plots of the standard errors [square roots of
expressions (11) and (16)] of ECI1 and ECI2 against true score on the
x-axis. They are almost identical curves that are nearly horizontal for
the average true scores but increase rather rapidly at both the high and
low extremes of true scores.

Insert Figures 3 & 4 about here

ECI1 and ECI2 correlate highly (r = .97, see Appendix XI) and have
the same constant expectation of zero. Moreover, their standard errors
have almost identical curves when plotted against true scores, so we
will drop ECI1 hereafter and make comparisons between ECI2 and ECI4.
Since ECI2 is defined by using the elements in the probability matrix
$(P_{ij})$, the investigation of ECI2 and ECI4 will be more interesting.

Standardized Extended Caution Indices, ECI2z and ECI4z, and their
Density Functions

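ECIs can be standardized by subtracting their expected values and
then dividing by their standard errors. Equations (23) and (24) are
the standardized extended caution indices ECI2z and ECI4z.

\[
\mathrm{ECI2}_z
 = \frac{\mathrm{ECI2} - E(\mathrm{ECI2} \mid \theta_i)}{SE(\mathrm{ECI2} \mid \theta_i)}
 = \frac{n\, \mathrm{cov}(\mathbf{P}_i - \mathbf{y}_i, \mathbf{G})}
        {\Big[\sum_{j=1}^{n} \sigma_{ij}^2 (G_j - T)^2\Big]^{1/2}}
\tag{23}
\]

\[
\mathrm{ECI4}_z
 = \frac{\mathrm{ECI4} - E(\mathrm{ECI4} \mid \theta_i)}{SE(\mathrm{ECI4} \mid \theta_i)}
 = \frac{n\, \mathrm{cov}(\mathbf{P}_i - \mathbf{y}_i, \mathbf{P}_i)}
        {\Big[\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2\Big]^{1/2}}
\tag{24}
\]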
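The closed forms in Equations (23) and (24) can be sketched as follows. The fragment assumes that $\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)$ is positive, so that the standard error can be cancelled against the denominator of the index without a sign change; the function names and example vectors are hypothetical.

```python
import math

def cov(u, v):
    """Covariance over the n items (divisor n)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n

def eci2_z(y_i, P_i, G):
    """Equation (23): n cov(P_i - y_i, G) / sqrt(sum_j sigma_ij^2 (G_j - T)^2)."""
    n = len(P_i)
    T = sum(G) / n
    diff = [p - y for p, y in zip(P_i, y_i)]
    scale = math.sqrt(sum(p * (1.0 - p) * (g - T) ** 2 for p, g in zip(P_i, G)))
    return n * cov(diff, G) / scale

def eci4_z(y_i, P_i):
    """Equation (24): n cov(P_i - y_i, P_i) / sqrt(sum_j sigma_ij^2 (P_ij - T_i)^2)."""
    n = len(P_i)
    T_i = sum(P_i) / n
    diff = [p - y for p, y in zip(P_i, y_i)]
    scale = math.sqrt(sum(p * (1.0 - p) * (p - T_i) ** 2 for p in P_i))
    return n * cov(diff, P_i) / scale

# Hypothetical four-item example.
P_i = [0.9, 0.7, 0.4, 0.2]
G = [0.8, 0.6, 0.5, 0.3]
unusual = [0, 0, 1, 1]   # easy items wrong, hard items right

print(eci2_z(unusual, P_i, G), eci4_z(unusual, P_i))
```

Note that the denominator covariance of the original indices drops out of both closed forms: ECI2z needs only $\mathbf{G}$ and the subject's probabilities, and ECI4z needs the subject's probabilities alone.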


As can be seen in Equations (23) and (24), the second variables of the
covariances in the numerators are $\mathbf{G}$ and $\mathbf{P}_i$,
respectively. The denominator for ECI2z involves the group-oriented vector
$\mathbf{G} - T\mathbf{1}$ while that for ECI4z involves the
individual-oriented vector at the level $\theta_i$,
$\mathbf{P}_i - T_i\mathbf{1}$. Tatsuoka and Linn (1982) argue that ECI4
may correspond to the individual consistency index (ICI) introduced in
Tatsuoka & Tatsuoka (1980, 1982b) while ECI2 may function similarly to the
group-dependent indices, i.e., Sato's caution index (1975) or the norm
conformity index (Tatsuoka & Tatsuoka, 1980, 1982a). The ICI has proven to
be effective in spotting the aberrant response patterns resulting from
consistent application of erroneous rules of operation (Tatsuoka &
Tatsuoka, 1981). Our prediction with regard to detection rates of
erroneous rules of operation is that ECI4 should be better than ECI2.

It should be noted that the scales of the original ECIs are
functions of $\theta$ but those of the standardized ECIzs no longer depend
on $\theta$. As a result, two ECI4z (or ECI2z) values obtained from
different $\theta$ levels are comparable in terms of the extent of anomaly
they signify. However, the density functions of ECI2z and ECI4z have to be
investigated in order to determine their differences statistically.
Figures 5 and 6 show the goodness-of-fit test of the normal distribution
for ECI2z and ECI4z. Appendices I and II give the tests of the normal
distribution for ECI1z and lz (Levine & Drasgow's standardized
appropriateness measure, 1982), while Appendices III, IV and V give the
goodness-of-fit tests of beta distributions for ECI1z, ECI2z, and ECI4z.
The data used in these figures are based on 2,400 students' scores
obtained from a math test (National Assessment of Educational Progress
series, mathematics for 13 year olds, Booklet 4). As can be seen in the
figures, both the standardized ECIs fit normal distributions well.
Similar results are obtained from the NAEP data, Booklet 5.

Insert Figures 5 & 6 about here

Appendices VII, VIII, IX and X give the standard errors of ECI1z,
ECI2z, and ECI4z and the expectation of ECI4z, obtained from the NAEP
data. Although the NAEP data is used for testing "goodness of fit" of
the ECIs with theoretical distributions, we will go back to the
signed-number data in order to investigate the detection rate of aberrant
response patterns by the standardized ECIs. In the next section, a
brief description of the dataset and the procedure for the comparisons
will be given.

FIGURE 5: Goodness of Fit Test for the Normal Distribution.
The Step Function is a Cumulative Distribution of ECI1z.
The Smooth Curve is a Theoretical Curve.

A  brief  description  of  the  dataset 

Birenbaum and Tatsuoka (1982a) have demonstrated that the
traditional zero-one scoring of incorrect and correct answers does not
reflect a student's performance correctly because several erroneous
rules frequently yield the right answer for some problems. By extensive
error analysis performed on the original dataset (the 127 eighth graders'
test scores for signed-number subtraction problems) Birenbaum and
Tatsuoka (1980) identified erroneous rules that were consistently
applied by certain students. They rescored ones to zeros for items that
students got right for the wrong reasons. The dataset used in Figures 1
through 4 is the modified dataset, in which the zero-one scores
should reflect the students' performance more accurately than in the
original dataset of N = 127. The modified dataset was much more nearly
unidimensional and had higher item-item and item-total correlations
than the original, while the item means and standard deviations remained
almost the same (Birenbaum & Tatsuoka, 1982a). Fifteen erroneous rules
were randomly selected from the 45 erroneous rules listed in Tatsuoka &
Tatsuoka (1981) and responses based on these were added to the modified
dataset. We refer to the new dataset of N = 142 as "Bugdata" hereafter.
Comparison of detection rates of ECI2z and ECI4z with respect to
their 80% Intervals

By using the item parameters estimated from the modified dataset,
ECI2z and ECI4z for the 142 subjects in the bugdataset were calculated
and plotted against the true scores. Figure 7 is the scatterplot of
ECI4z against the true scores and Figure 8 is that of ECI2z against the
same true scores. The 15 bugs are marked by a small circle "o" with the
numbers and 89 real data points are marked by a plus sign "+" without
being numbered.

Insert Figures 7 & 8 about here

The 80% intervals for both the ECIs and lz are constructed and
listed in Table 1 along with the means and standard deviations of the
indices. These are the intervals within which, theoretically, the
values of the indices associated with 80% of the non-aberrant responses
should fall. The intervals are marked by broken lines in Figures 7 and
8. We may choose, as a convenient decision rule, to classify response
patterns with index values outside these intervals as "aberrant." The
proportions of real response patterns classified as "aberrant" (which
are essentially false alarm rates) by the four indices are shown in
Table 2 along with the proportions of the 15 bugs that are detected.

Insert Table 1 about here

Insert Table 2 about here
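The decision rule described above can be sketched as a pair of helper functions. The sketch assumes that each 80% interval is the mean plus or minus 1.28 standard deviations (the normal-theory value), which reproduces the ECI4z interval reported in Table 1 to within rounding; the text does not state how the intervals were computed. The function names and the index values passed to `flag_aberrant` are hypothetical.

```python
def interval_80(mean, sd, z=1.28):
    """Two-sided interval mean +/- z * sd; z = 1.28 is an assumed normal quantile."""
    return (mean - z * sd, mean + z * sd)

def flag_aberrant(values, interval):
    """True for every index value that falls outside the interval."""
    low, high = interval
    return [v < low or v > high for v in values]

def rate(flags):
    """Proportion of flagged patterns (false alarm rate or detection rate)."""
    return sum(flags) / len(flags)

# Mean and S.D. of ECI4z as reported in Table 1.
eci4z_interval = interval_80(0.019, 1.229)

# Hypothetical standardized index values, for illustration only.
real_students = [-0.4, 0.2, 1.7, -0.9, 0.6]
bugs = [3.1, 0.8, -2.2, 4.0]

false_alarm_rate = rate(flag_aberrant(real_students, eci4z_interval))
detection_rate = rate(flag_aberrant(bugs, eci4z_interval))
```

Applied to response patterns from real students the flagged proportion plays the role of a false alarm rate, and applied to the generated bug patterns it is the detection rate.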

The unstandardized ECI4 seemed to have the best detection rates in
comparison with the other four ECIs (Tatsuoka & Linn, 1982) but lost its
high rate after it was standardized. Exactly the same dataset is used
in both cases, the standardized and unstandardized fourth extended
caution index. In Table 2, the false alarm rates of the four indices
vary around 20% as they should, while the correct detection rate
fluctuates around 60%. Considering the fact that the false alarm rate
for the 89 students by using ICI with total scores (ICI > .90 and scores
lower than a certain criterion, Tatsuoka & Tatsuoka, 1981) was less than
5%, the results summarized in Table 2 are not as good as we had
expected. One reason for the low detection rates may be the fact that
the modification procedure of rescoring in the original dataset was
carried out by an intuitive error analysis, and hence there are some
responses affected by persistent misconceptions left in the modified
dataset. Table 3 lists the percentage of "bugs" left in the modified
dataset. The total number of bugs (including repetitions) has become
42. The mean absolute values of ECI4z in the two groups described in
Table 3 are 3.141 for the bugs that were not found in the modified
dataset and 1.353 for the bugs left in. However, the value of ECI4z,
1.353, is still substantially high in comparison with the majority of
real responses in the modified dataset.

Table 1

The 80% Intervals of ECI1z, ECI2z, ECI4z and lz.

Indices     Mean     S.D.     80% confidence interval
ECI1z       .001     1.105    (-1.414, 1.416)
ECI2z       .020     1.230    (-1.555, 1.594)
ECI4z       .019     1.229    (-1.554, 1.593)
lz          .017      .619    ( -.775,  .809)

Table 2

Detection Rates of Erroneous Rules by Four
Personal Indices Based on Item Response Theory
with Bugdataset

Indices     Real Students     Erroneous Rules
            N = 89            N = 15
ECI1z       .22               .60
ECI2z       .15               .53
ECI4z       .17               .67
lz          .18               .67

Insert  Table  3  about  here 

Table 3

Percentage of Each Bug that was not Rescored and Remained
in the November Modified Dataset (n = 8, N = 89), 356 Sets of Responses

Bugs     %        Total Scores     ECI4*
 1       0        4                 3.728
 3       0        3                 4.309
 4       0        2                 4.259
 8       0        6                 3.059
10       0        3                 4.045
12       0        2                -1.247
13       0        1                 1.338

 2       .006     6                 2.554
 5       .011     5                -1.435
 6       .014     6                -2.197
 7       .003     4                  .631
 9       .008     1                 -.887
11       .014     1                 1.084
14       .014     6                 1.162
15       .048     7                  .876

*Mean of Group 1 = 3.141, S.D. = .503
 Mean of Group 2 = 1.353, S.D. = .240

Summary and Discussion

The extended caution indices, ECI1, ECI2 and ECI4, are standardized
by the usual transformation,

\[
\mathrm{ECI}m_z = \frac{\mathrm{ECI}m - E(\mathrm{ECI}m \mid \theta_i)}
                       {SE(\mathrm{ECI}m \mid \theta_i)}
\qquad \text{for } m = 1, 2, \text{ and } 4 .
\]

The conditional expectation of ECI4 is a function of the $\theta$ level, but
those of the other two ECIs are identically zero. If we sample two
students from different $\theta_i$ levels, then it is dangerous to compare
their ECI4 values in order to determine which student's response pattern
is more aberrant than the other. Moreover, the standard errors of all
three ECIs are functions of $\theta_i$ and have U-shaped trend curves. This
explains the past findings that the correlations of personal indices,
such as the caution index, NCI, or ICI, with total scores vary according
to the shapes of the total-score distributions. The findings are that
if the total-score distribution has a negative skewness, then the
correlation is positive; if the distribution is positively skewed, then
a negative correlation results (Harnisch & Linn, 1981; Tatsuoka &
Tatsuoka, 1980). Since the ECIs are natural extensions of the caution
index, we can safely impute some behaviors of ECIs to these discrete
personal indices as well. ECIs provide inflated values at both the
extremely high and low total scores. With the standardized ECIs, the
bias of the values at the extreme scores is corrected, and moreover the
responses from different levels of $\theta$ can be compared safely.

It would be ideal if the theoretical distribution of the
standardized extended caution indices could be derived algebraically,
but goodness-of-fit tests of the ECIzs with normal distributions provide
satisfactory evidence that they may follow approximately normal
distributions.
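One way to quantify how closely a set of standardized index values follows the standard normal distribution is the largest vertical gap between the empirical step function and the theoretical curve (the Kolmogorov-Smirnov distance). The report does not say which goodness-of-fit statistic was used, so the choice of this statistic is an assumption, and the function names and sample values below are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance(values):
    """Largest gap between the empirical step function and the N(0, 1) CDF."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for k, x in enumerate(xs):
        F = normal_cdf(x)
        # Compare the theoretical curve with the step just after and just before x.
        gap = max(gap, abs((k + 1) / n - F), abs(k / n - F))
    return gap

# Hypothetical standardized index values, for illustration only.
sample = [-1.3, -0.6, -0.2, 0.1, 0.4, 0.9, 1.5]
print(ks_distance(sample))
```

A small distance indicates that the step function stays close to the smooth curve over the whole range; turning the distance into a formal test would additionally require critical values for the sample size at hand.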

Regarding the detection rates of "bugs", they are unexpectedly low.
We have tried to find the reason for this by investigating each response
pattern in the modified dataset. The results indicate that if an
otherwise normal dataset includes a considerable number of aberrant
response patterns, then these patterns are no longer detectable with
high probability by the ECI approach. A new method to detect such
aberrant response patterns should be investigated in the future.


Rudner (1982) recently conducted a Monte Carlo study to compare the
detection rates of various indices. He found that the indices based on
item response theory performed consistently better with his data than
the indices based on sample statistics alone. But IRT is not always
applicable in practice. An advantage of ECIs in comparison with other
appropriateness indices or Wright's index is that one can start from
the caution index when a sample is small and then shift to
ECIs as the sample size becomes larger without loss of continuity,
because ECIs are natural extensions of the S-P curve theory. However,
further investigation of the relationships between the original caution
index and the ECIs will be needed.
Captions of Appendices

Appendix I: Goodness of Fit Test for the Normal Distribution: The
Step Function is the Cumulative Distribution of ECI1z

Appendix II: Goodness of Fit Test for the Normal Distribution: The